In this exercise you will explore estimating the relationship between two camera views using the epipolar geometry concepts learned in the lectures.
NOTE: When submitting the Colab notebook, please remove any unnecessary/debug cells. That is, your notebook should be clean to make it easy for the grader to follow.
Student 1:
Name: Rony Kositsky
ID: 205817893
Student 2:
Name: Yonatan Gartenberg
ID: 311126205
import numpy as np
import cv2
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
# utility function for reading the 2d points from the given txt files
def load_file(name):
    lst = []
    with open(name, 'r') as f:
        for line in f:
            lst.append(line.strip().split())
    return np.array(lst, dtype=np.float32)
from google.colab import drive
drive.mount("/content/drive", force_remount=True)
Recall that we are interested in solving the correspondence problem: given two images taken from cameras at different positions, how do we match a point in the first image to a point in the second image?
We have seen that it is possible to learn a mapping of points in one image to lines in another image using the Fundamental Matrix.
To do so, we have provided you with two scenes:
- For sceneA, we have provided you with corresponding point locations listed in sceneA-pts2d-1.txt and sceneA-pts2d-2.txt.
- For sceneB, we ask you to compute the corresponding points yourself. Please use the provided utility script tag_image.py to do so.

Hint: When solving for the fundamental matrix $F$, think about how many corresponding points we need. What happens if we have more pairs than required?
Write a function that computes the Fundamental Matrix that satisfies the epipolar constraints defined by the sets of corresponding points using least squares.
def calculate_fundamental_matrix(points_a, points_b, num_rows):
    # Each correspondence (x_a, x_b) contributes one row enforcing x_b^T F x_a = 0.
    A = np.empty((num_rows, 9))
    for i in range(num_rows):
        A[i] = np.kron(np.append(points_b[i], 1), np.append(points_a[i], 1))
    # The least-squares solution is the right-singular vector associated with
    # the smallest singular value, i.e., the last row of V^T.
    U, S, Vt = np.linalg.svd(A)
    return Vt[8].reshape(3, 3).copy()
# Compute the Fundamental Matrix for Scene A
pts_a_a = load_file('/content/drive/My Drive/Colab Notebooks/input/sceneA/sceneA-pts2d-1.txt')
pts_b_a = load_file('/content/drive/My Drive/Colab Notebooks/input/sceneA/sceneA-pts2d-2.txt')
num_rows, num_cols = pts_a_a.shape
F_sceneA = calculate_fundamental_matrix(pts_a_a, pts_b_a, num_rows)
print(f'Scene A: Fundamental Matrix with Rank = 3: \n {F_sceneA} \n')
Recall that $F$ is only known up to a scale factor. To make sure that your code is correct: if you solved for $F$ correctly, you should get a matrix that is a scaled equivalent of the following for sceneA:
F = [[ 2.67713722e-07 -2.57763422e-06 1.12898577e-03]
[ 7.43898031e-06 3.90196194e-06 7.31449001e-03]
[-2.43366396e-03 -1.14515247e-02 1.00000000e+00]]
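One way to verify this is to compare the two matrices up to scale, since any non-zero multiple of $F$ satisfies the same epipolar constraints. A minimal sketch (the helper name `is_scale_equivalent` is ours, not part of the assignment; it normalizes each matrix by its largest-magnitude entry so the unknown global scale cancels):

```python
import numpy as np

def is_scale_equivalent(F1, F2, tol=1e-6):
    # Divide each matrix by its largest-magnitude entry (keeping its sign),
    # so any common scale factor cancels, then compare element-wise.
    F1n = F1 / F1.flat[np.abs(F1).argmax()]
    F2n = F2 / F2.flat[np.abs(F2).argmax()]
    return np.allclose(F1n, F2n, atol=tol)

F_ref = np.array([[ 2.67713722e-07, -2.57763422e-06, 1.12898577e-03],
                  [ 7.43898031e-06,  3.90196194e-06, 7.31449001e-03],
                  [-2.43366396e-03, -1.14515247e-02, 1.00000000e+00]])

# Any non-zero scalar multiple should be accepted.
print(is_scale_equivalent(F_ref, -4.2 * F_ref))  # True
```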
pts_a_b = load_file('/content/drive/My Drive/Colab Notebooks/input/sceneB/our_pts1.txt')
pts_b_b = load_file('/content/drive/My Drive/Colab Notebooks/input/sceneB/our_pts2.txt')
num_rows, num_cols = pts_a_b.shape
F_sceneB = calculate_fundamental_matrix(pts_a_b, pts_b_b, num_rows)
print(f'Scene B: Fundamental Matrix with Rank = 3: \n {F_sceneB} \n')
Question: Notice that the least squares estimate of $F$ is of full rank. However, the Fundamental Matrix should be a matrix of rank $2$. Why is the fundamental matrix of rank $2$? (We require you to show only the upper bound, that is, that $F$ is not of full rank.)
Answer: We calculated $F$ from the equation $x_2^T F x_1 = 0$. Every epipolar line in the right image passes through the right epipole $e_2$, and therefore it holds that $F^T e_2 = 0$. By symmetry, the same holds for the left epipole: $F e_1 = 0$. We can deduce that $F$ has a non-zero vector in its null space (an eigenvector with eigenvalue zero). Therefore, $F$'s rank $\le$ $F$'s dimension $- 1$. Since $F$ has dimension $3$, the upper bound for the rank is $2$.
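The argument above can be sanity-checked numerically: a $3 \times 3$ matrix with a zero singular value has rank 2 and a non-zero null vector (for $F$, that null vector is the epipole). A small synthetic sketch, not using the estimated matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
U, _, Vt = np.linalg.svd(rng.standard_normal((3, 3)))
F = U @ np.diag([5.0, 2.0, 0.0]) @ Vt  # rank-2 by construction

e = Vt[2]  # right-singular vector of the zero singular value

print(np.linalg.matrix_rank(F))   # 2
print(np.allclose(F @ e, 0))      # True: a non-zero null vector exists
```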
From the proof you provided above, $F$ should be of rank 2. Therefore, we must reduce the rank of the estimated $F$. To do so, we can decompose $F$ using singular value decomposition (SVD) into the matrices $U \Sigma V^T$. We can then estimate a rank-2 matrix by setting the smallest singular value in $\Sigma$ to zero, thus generating $\hat{\Sigma}$. The Fundamental Matrix is then easily calculated as $\hat{F} = U \hat{\Sigma} V^T$.
Compute the fundamental matrix of rank 2 using SVD as described above.
def reshape_to_rank_two(A):
    # Zero out the smallest singular value (the last one, since SVD
    # returns them in descending order) and recompose the matrix.
    U, S, Vt = np.linalg.svd(A)
    S[-1] = 0
    return U @ np.diag(S) @ Vt
F_sceneA_rank_2 = reshape_to_rank_two(F_sceneA)
print(f'Scene A: Fundamental Matrix with Rank = 2: \n {F_sceneA_rank_2} \n')
F_sceneB_rank_2 = reshape_to_rank_two(F_sceneB)
print(f'Scene B: Fundamental Matrix with Rank = 2: \n {F_sceneB_rank_2} \n')
Given our fundamental matrix of rank 2, we can now compute the epipolar line $l_2 = F x_1$ in the second image corresponding to a point $x_1$ in the first image.
A similar equation, $l_1 = F^T x_2$, can be used for computing the epipolar lines in the first image.
You are now tasked with drawing these epipolar lines.
Observe that drawing the epipolar lines on the image plane is not trivial. This is because the lines are defined in homogeneous coordinates whereas the standard line function takes two input points.
In order to find two such points, we can find the intersection of a given line with the image boundaries.
To do so, we need to compute two things: the homogeneous line $l_L$ of the left image border ($x = 0$) and the homogeneous line $l_R$ of the right image border ($x = \text{width}$).
Once we have these lines, we can compute the intersection point $p_L$ between $l$ and $l_L$ and the intersection point $p_R$ between $l$ and $l_R$.
Once we have these two points, we can compute the line running through the points $p_L$ and $p_R$.
To summarize, in this question you are required to compute the following: the epipolar line for each corresponding point, its intersections with the image borders, and the resulting line segment to draw.
After computing these lines, you should draw the estimated epipolar lines on each of the two images. For each scene, please display the two images side-by-side with the epipolar lines drawn on each.
Hint: The intersection between two lines $l_1$ and $l_2$ is a point $p$ in homogeneous coordinates s.t. $p^T l_1 = 0$ and $p^T l_2 = 0$, that is, a point which is perpendicular to both $l_1$ and $l_2$ (i.e., $p = l_1 \times l_2$).
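To illustrate the hint, the cross product of two homogeneous lines yields their intersection point. A minimal sketch intersecting an example line with the image's left and right border lines (in homogeneous form, the borders $x = 0$ and $x = \text{width}$ are $(1, 0, 0)$ and $(1, 0, -\text{width})$; the line and width below are made up for illustration):

```python
import numpy as np

def intersect(l1, l2):
    # The intersection of two homogeneous lines is their cross product,
    # dehomogenized by the third coordinate.
    p = np.cross(l1, l2)
    return p[:2] / p[2]

width = 640
l = np.array([1.0, -2.0, 100.0])             # example line: x - 2y + 100 = 0
left_border  = np.array([1.0, 0.0, 0.0])     # x = 0
right_border = np.array([1.0, 0.0, -width])  # x = width

print(intersect(l, left_border))   # [  0.  50.]  -> point on x = 0
print(intersect(l, right_border))  # [640. 370.]  -> point on x = width
```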
def draw_lines(points, F, width, image):
    # For each point x, the corresponding epipolar line is l = F x = (a, b, c),
    # i.e., the set of pixels (u, v) satisfying a*u + b*v + c = 0.
    for point in points:
        point = np.append(point, 1)
        a, b, c = np.dot(F, point)
        # Intersect the line with the left (u = 0) and right (u = width) borders.
        left_point = (0, int(-c / b))
        right_point = (width, int(-(a * width + c) / b))
        cv2.line(image, left_point, right_point, (0, 100, 0), 2)
# Draw the epipolar lines on the images of Scene A and display + save the images side-by-side
image_one = cv2.imread('/content/drive/My Drive/Colab Notebooks/input/sceneA/sceneA-im-1.png')
image_two = cv2.imread('/content/drive/My Drive/Colab Notebooks/input/sceneA/sceneA-im-2.png')
image_one_height, image_one_width, _ = image_one.shape
image_two_height, image_two_width, _ = image_two.shape
# With the constraint x_2^T F x_1 = 0 used above, F maps image-1 points to
# image-2 lines, and F^T maps image-2 points to image-1 lines.
draw_lines(pts_b_a, F_sceneA_rank_2.T, image_one_width, image_one)
draw_lines(pts_a_a, F_sceneA_rank_2, image_two_width, image_two)
fig, ax = plt.subplots(1,2,figsize=(15,15))
ax[0].imshow(cv2.cvtColor(image_one, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR
ax[1].imshow(cv2.cvtColor(image_two, cv2.COLOR_BGR2RGB))
plt.show()
# Draw the epipolar lines on the images of Scene B and display + save the images side-by-side
image_one = cv2.imread('/content/drive/My Drive/Colab Notebooks/input/sceneB/sceneB-im-1.png')
image_two = cv2.imread('/content/drive/My Drive/Colab Notebooks/input/sceneB/sceneB-im-2.png')
image_one_height, image_one_width, _ = image_one.shape
image_two_height, image_two_width, _ = image_two.shape
# As before: F^T maps image-2 points to image-1 lines, and vice versa.
draw_lines(pts_b_b, F_sceneB_rank_2.T, image_one_width, image_one)
draw_lines(pts_a_b, F_sceneB_rank_2, image_two_width, image_two)
fig, ax = plt.subplots(1,2, figsize=(15,15))
ax[0].imshow(cv2.cvtColor(image_one, cv2.COLOR_BGR2RGB))  # OpenCV loads BGR
ax[1].imshow(cv2.cvtColor(image_two, cv2.COLOR_BGR2RGB))
plt.show()
Take a look at the results of the last section. You may notice that the epipolar lines are not exact.
The problem here is that the offset and scale of the point coordinates are large and biased compared to some of the matrix entries. To fix this, we can normalize the points.
Specifically, we want to construct a transformation that will make the mean of the points $0$ and scale the points to magnitude $1$. In other words, we wish to find:

$$
\begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} =
\begin{pmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & -c_u \\ 0 & 1 & -c_v \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
$$

The transform matrix $T$ is the product of the scale and offset matrices. Notice that $c_u, c_v$ are simply the means of the points. To compute the scale $s$ you can first estimate the standard deviation after subtracting the means of the points. Then, $s$ can be defined as the reciprocal of the standard deviation.
Create two matrices $T_a$ and $T_b$ for the sets of points defined in the files sceneB-pts2d-1.txt and sceneB-pts2d-2.txt, respectively.
Then, normalize the two sets of points and compute the new normalized fundamental matrix $\hat{F}$.
Note: make sure that $\hat{F}$ is of rank $2$!
(You only need to complete this for sceneB using the points provided to you)
# Complete the bonus. Compute the normalized Fundamental Matrix using the points given in `sceneB-pts2d-1.txt` and `sceneB-pts2d-2.txt` for `sceneB`.
# Then visualize the epipolar lines obtained with the matrix you got.
####################
## YOUR CODE HERE
####################
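One possible sketch of the normalization step, reusing `calculate_fundamental_matrix` and `reshape_to_rank_two` from above; the names `pts1`/`pts2` are placeholders for the points loaded from `sceneB-pts2d-1.txt` and `sceneB-pts2d-2.txt`:

```python
import numpy as np

def normalization_matrix(pts):
    # Offset by the mean, scale by the reciprocal of the standard deviation.
    c = pts.mean(axis=0)
    s = 1.0 / (pts - c).std()
    scale  = np.diag([s, s, 1.0])
    offset = np.array([[1.0, 0.0, -c[0]],
                       [0.0, 1.0, -c[1]],
                       [0.0, 0.0, 1.0]])
    return scale @ offset

def normalize(pts, T):
    # Apply T to the homogeneous points and return inhomogeneous coordinates.
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ homog.T).T[:, :2]

# Placeholder usage, assuming pts1/pts2 are loaded as above:
# T_a = normalization_matrix(pts1)
# T_b = normalization_matrix(pts2)
# F_hat = reshape_to_rank_two(calculate_fundamental_matrix(
#     normalize(pts1, T_a), normalize(pts2, T_b), len(pts1)))
# F = T_b.T @ F_hat @ T_a   # undo the normalization
```

With the constraint $x_2^T F x_1 = 0$, substituting the normalized points $x' = T x$ gives $x_2^T T_b^T \hat{F} T_a x_1 = 0$, hence the denormalization $F = T_b^T \hat{F} T_a$.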
Up to now, we've only discussed how one can compute the epipolar lines between two images.
In this section, we use a window-based approach for estimating dense stereo correspondence.
For this part, we will be working with the images named corr-img-l.png and corr-img-r.png found in input/correspondences.
Here, you will implement the stereo algorithm: take a window around every pixel in one image and search for the best match in the other image. This concept is discussed in the class lecture (~slides 58-68).
The basic algorithm is defined as follows:
To help simplify the problem, we will assume that the two images are aligned horizontally. Therefore, when performing the matching, you only need to scan across the same horizontal scan line of the other image.
When computing the similarities, we will use the sum of squared differences (SSD) measure.
Implement the SSD matching algorithm. The algorithm returns a disparity image $D$, where $D(x, y)$ holds the horizontal offset between pixel $(x, y)$ and its best match in the other image.
Then, repeat this for matching from the right image to the left image. That is, you should output two images: the left-to-right disparity map and the right-to-left disparity map.
Then, change the values of: the window size and the maximum disparity range.
How does changing these values affect the results? Please clearly display and label the results in the notebook with the two disparity maps shown side-by-side.
Some notes to consider:
- You may use cv2.matchTemplate with cv2.TM_SQDIFF_NORMED to compute the similarities.
- To speed up the computation, you may downsample the images using cv2.pyrDown().
- When displaying the images, please display them in grayscale!
Hint: To verify that your implementation is correct, think about the values you should obtain closer to the camera. Larger values should be shown in white with lower values shown in black.
L = cv2.imread('/content/drive/My Drive/Colab Notebooks/input/correspondences/corr-img-l.png', cv2.IMREAD_GRAYSCALE)
R = cv2.imread('/content/drive/My Drive/Colab Notebooks/input/correspondences/corr-img-r.png', cv2.IMREAD_GRAYSCALE)
L_org = L * (1.0 / 255.0)
R_org = R * (1.0 / 255.0)
After computing the disparity maps, you should display the results side-by-side, which can be done using the following cell as a reference. Please note that when displaying the results you should clearly specify the parameters used to obtain them!
plt.figure(figsize=(16, 16))
plt.subplot(1, 2, 1)
plt.imshow(L, cmap='gray')
plt.subplot(1, 2, 2)
plt.imshow(R, cmap='gray')
# Simple helpers (equivalent to the built-in min/max).
def findMin(a, b):
    return min(a, b)

def findMax(a, b):
    return max(a, b)
# Go over every pixel in the image, and for each pixel:
#   * determine its window boundaries (avoiding overflow at the image borders)
#   * take the sub-image around it as the template to compare
#   * compare with cv2.matchTemplate (cv2.TM_SQDIFF_NORMED by default)
#   * use cv2.minMaxLoc on the result to obtain the location of the best match
#   * the disparity is the distance between the original x position and the found x position
def stereo_match(left_img, right_img, window_size, disparity_range, compare_func=cv2.TM_SQDIFF_NORMED):
    height, width = left_img.shape
    disparity_map = np.zeros((height, width))
    half_window = int(window_size / 2)
    half_range = int(disparity_range / 2)
    for row in range(height):
        # Clamp the window rows to the image boundaries.
        row_top = findMin(row + half_window, height - 1)
        row_bottom = findMax(row - half_window, 0)
        for col in range(width):
            # Clamp the window columns to the image boundaries.
            col_right = findMin(col + half_window, width - 1)
            col_left = findMax(col - half_window, 0)
            source_window = left_img[row_bottom:row_top, col_left:col_right]
            # Column range to search over (the row stays the same).
            begin = findMax(col - half_range, 0)
            end = findMin(col + half_range, width - 1)
            # "Cut" the strip of the other image we want to search in.
            dest_window = right_img[row_bottom:row_top, begin:end]
            match_result = cv2.matchTemplate(dest_window, source_window, compare_func)
            min_val, max_val, min_point, max_point = cv2.minMaxLoc(match_result)
            # Correlation measures: best match is the maximum;
            # squared-difference measures: best match is the minimum.
            if compare_func == cv2.TM_CCORR_NORMED:
                point = max_point
            else:
                point = min_point
            # The disparity is the distance between the original column
            # and the column of the best match.
            disparity_map[row][col] = abs(begin + point[0] - col)
    return disparity_map
# Compute the disparity maps for multiple values of window size and maximum disparity
left_to_right = stereo_match(L, R, 6, 250)
right_to_left = stereo_match(R, L, 6, 250)
plt.figure(figsize=(16, 16))
plt.subplot(1, 2, 1)
plt.imshow(left_to_right, cmap='gray')
plt.subplot(1, 2, 2)
plt.imshow(right_to_left, cmap='gray')
Question: How does changing the value of the window size affect the results? How does changing the value of the maximum disparity range affect the results?
Answer: After exploring a wide range of options, we have come to the conclusion that the window size and the disparity range are linked.
Keeping the disparity range at 250: if we use very low values of the window size (around ~2), the result contains a lot of small noise, yet the details of the picture remain "sharp". If we use high values of the window size (around ~15), the result is less noisy, but the details appear blurred.
Regarding the disparity range, keeping the window size at 6: if we use very low values (around ~120), there is also noise, and the smaller details of the pictures are distorted, while the bigger ones remain more intact. If we use very high values (around ~300), the details are clearer, but the differences that are supposed to appear between close and far objects become too small to distinguish (i.e., the gray hues are very similar).
We conclude there is a "sweet spot" between the two. In our case we achieved it around window size = 6 and disparity range = 250.
Above we used SSD to compute the correspondences. However, SSD is not robust to small changes in the image.
Here, we will explore the sensitivity of SSD to small changes in the images. Specifically, repeat Question 1, but with the following changes:
- Add Gaussian noise to one of the input images.
- Increase the contrast of one of the images by multiplying its intensities by a constant factor.
Display the results of both of the variations. What happened to the results?
# Demonstrate the sensitivity of the SSD matching algorithm as explained above
# Generating Gaussian noise; add it in float and clip to [0, 255] to avoid uint8 wrap-around
noise1 = np.random.normal(0, 20, L.shape)
noise_left = np.clip(L.astype(np.float64) + noise1, 0, 255).astype(np.uint8)
noise2 = np.random.normal(0, 20, R.shape)
noise_right = np.clip(R.astype(np.float64) + noise2, 0, 255).astype(np.uint8)
fig, ax = plt.subplots(2, 2)
ax[0, 0].imshow(noise1, cmap='gray')
ax[0, 1].imshow(noise_left, cmap='gray')
ax[1, 0].imshow(noise2, cmap='gray')
ax[1, 1].imshow(noise_right, cmap='gray')
ax[0, 0].set_title("First Noise")
ax[0, 1].set_title("Noisy Picture Left")
ax[1, 0].set_title("Second Noise")
ax[1, 1].set_title("Noisy Picture Right")
fig.set_figheight(13)
fig.set_figwidth(13)
plt.show()
left_to_right_noise = stereo_match(L, noise_right, 6, 250)
right_to_left_noise = stereo_match(R, noise_left, 6, 250)
#Comparing pictures
fig, ax = plt.subplots(2, 2)
ax[0, 0].imshow(left_to_right, cmap='gray')
ax[0, 1].imshow(left_to_right_noise, cmap='gray')
ax[1, 0].imshow(right_to_left, cmap='gray')
ax[1, 1].imshow(right_to_left_noise, cmap='gray')
ax[0, 0].set_title("Original Left")
ax[0, 1].set_title("Noisy Left")
ax[1, 0].set_title("Original Right")
ax[1, 1].set_title("Noisy Right")
fig.set_figheight(13)
fig.set_figwidth(13)
plt.show()
# Contrast part: multiplying the uint8 image wraps around on overflow,
# which inverts many bright pixels (an extreme contrast distortion)
contrast_factor = 3
R_contrast = R * contrast_factor
result = stereo_match(L, R_contrast, 6, 250)
plt.figure(figsize=(16, 16))
plt.subplot(1, 2, 1)
plt.imshow(R_contrast, cmap='gray')
plt.subplot(1, 2, 2)
plt.imshow(result, cmap='gray')
Question: Why is SSD sensitive to Gaussian noise? Why is it sensitive to changes in contrast?
Answer: SSD is sensitive to Gaussian noise because the noise perturbs most of the pixels. With the noise in place, the measured distance between two picture windows is more likely to give false results, because the minimum distance is less likely to occur at the same location as it would without the noise.
Regarding the contrast, the sensitivity originates in the fact that only one picture is "contrasted". Multiplying the uint8 intensities wraps around on overflow, turning many bright pixels dark and vice versa. This phenomenon makes it nearly impossible for SSD to find the matching pixel in the other image.
We have now seen that the SSD metric is not robust to small changes between the input images. Therefore, in this section we will compute the correspondences using the Normalized Cross Correlation (NCC) metric introduced in class.
Repeat Questions 1 and 2 where now, NCC is used for computing the similarity between two window patches. As before, you may use cv2.matchTemplate for computing the similarity.
Again, explore different values for the window size and maximum disparity.
Think to yourself: how did the NCC measure affect the results when adding Gaussian noise? How about when changing the contrast? Do your results make sense?
# Repeat Questions 1 and 2 using the NCC metric for computing similarities
#Repeat question 1
left_to_right = stereo_match(L, R, 6, 250, cv2.TM_CCORR_NORMED)
right_to_left = stereo_match(R, L, 6, 250, cv2.TM_CCORR_NORMED)
plt.figure(figsize=(16, 16))
plt.subplot(1, 2, 1)
plt.imshow(left_to_right, cmap='gray')
plt.subplot(1, 2, 2)
plt.imshow(right_to_left, cmap='gray')
#Repeat question 2
# Add Gaussian noise in float and clip to [0, 255] to avoid uint8 wrap-around
noise1 = np.random.normal(0, 15, L.shape)
noise_left = np.clip(L.astype(np.float64) + noise1, 0, 255).astype(np.uint8)
noise2 = np.random.normal(0, 15, R.shape)
noise_right = np.clip(R.astype(np.float64) + noise2, 0, 255).astype(np.uint8)
fig, ax = plt.subplots(2, 2)
ax[0, 0].imshow(noise1, cmap='gray')
ax[0, 1].imshow(noise_left, cmap='gray')
ax[1, 0].imshow(noise2, cmap='gray')
ax[1, 1].imshow(noise_right, cmap='gray')
ax[0, 0].set_title("First Noise")
ax[0, 1].set_title("Noisy Picture Left")
ax[1, 0].set_title("Second Noise")
ax[1, 1].set_title("Noisy Picture Right")
fig.set_figheight(13)
fig.set_figwidth(13)
plt.show()
left_to_right_noise = stereo_match(L, noise_right, 6, 250, cv2.TM_CCORR_NORMED)
right_to_left_noise = stereo_match(R, noise_left, 6, 250, cv2.TM_CCORR_NORMED)
#Comparing pictures
fig, ax = plt.subplots(2, 2)
ax[0, 0].imshow(left_to_right, cmap='gray')
ax[0, 1].imshow(left_to_right_noise, cmap='gray')
ax[1, 0].imshow(right_to_left, cmap='gray')
ax[1, 1].imshow(right_to_left_noise, cmap='gray')
ax[0, 0].set_title("Original Left")
ax[0, 1].set_title("Noisy Left")
ax[1, 0].set_title("Original Right")
ax[1, 1].set_title("Noisy Right")
fig.set_figheight(13)
fig.set_figwidth(13)
plt.show()